Section: Scientific Foundations

Perceptual Modelling

Saliency, visual attention, cognition

The human visual system (HVS) cannot process all the visual information in our visual field at once. To cope with this limitation, it must filter out irrelevant information and reduce redundant information. This is achieved through a selective sensing and analysis process. For instance, it is well known that the greatest visual acuity is provided by the fovea (the center of the retina). Beyond this area, acuity drops as eccentricity increases. Another example concerns the light that impinges on our retina: only the visible spectrum, lying between about 380 nm (violet) and 760 nm (red), is processed. Finally, regarding selective sensing, our sensitivity depends on a number of factors such as spatial frequency, orientation and depth. These properties are modeled by a sensitivity function such as the Contrast Sensitivity Function (CSF).
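To make the notion of a sensitivity function concrete, here is a minimal sketch of one CSF. The text above does not specify a particular model; the analytic form used here is the one commonly attributed to Mannos and Sakrison (1974), and it is chosen purely as an illustrative assumption. The function name and the sampled frequency range are likewise hypothetical.

```python
import math


def csf_mannos_sakrison(f):
    """Relative contrast sensitivity at spatial frequency f (cycles/degree).

    Assumed analytic form (Mannos and Sakrison, 1974); not taken from the
    text above, which only mentions the CSF in general terms.
    """
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))


# Sample the curve from 0.5 to 60 cycles/degree in steps of 0.5.
freqs = [0.5 * k for k in range(1, 121)]

# The curve is band-pass: sensitivity peaks at a mid spatial frequency
# and falls off for both coarser and finer patterns.
peak = max(freqs, key=csf_mannos_sakrison)
print(peak, round(csf_mannos_sakrison(peak), 3))
```

Under this model the sensitivity is highest at a few cycles per degree and decreases at both low and high spatial frequencies, which is the band-pass behaviour that a CSF is meant to capture.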

Our capacity for analysis is also related to our visual attention. Visual attention, which is closely linked to eye movements, allows us to focus our biological resources on a particular area. (Attention that involves eye movements is called overt, whereas covert attention does not involve eye movements.) It can be controlled by both top-down (i.e. goal-directed, intention) and bottom-up (stimulus-driven, data-dependent) sources of information (L. Itti and C. Koch, “Computational Modelling of Visual Attention”, Nature Reviews Neuroscience, vol. 2, no. 3, pp. 194-203, 2001.). Attention is also influenced by prior knowledge about the environment of the scene (J. Henderson, “Regarding scenes”, Current Directions in Psychological Science, vol. 16, pp. 219-222, 2007.). Implicit assumptions related to prior knowledge or beliefs play an important role in our perception (for example, the assumption that light comes from above-left). Our perception results from the combination of prior beliefs with the data we gather from the environment. A Bayesian framework is an elegant way to model these interactions (L. Zhang, M. Tong, T. Marks, H. Shan and G.W. Cottrell, “SUN: a Bayesian framework for saliency using natural statistics”, Journal of Vision, vol. 8, pp. 1-20, 2008.). We define a vector $v_l$ of local measurements (contrast of color, orientation, etc.) and a vector $v_c$ of global and contextual features (global features, prior locations, type of the scene, etc.). The saliency $S$ at a spatial position $x$ is then given by:

$$S(x) = \frac{1}{p(v_l \mid v_c)} \times p(s, x \mid v_c) \qquad (1)$$

The first term, $1 / p(v_l \mid v_c)$, represents the bottom-up salience. It is based on a kind of contrast detection, following the assumption that rare image features are more salient than frequent ones. Most existing computational models of visual attention rely on this term, although they differ in how they extract the local visual features as well as the global ones. The second term, $p(s, x \mid v_c)$, represents the contextual priors. For instance, given a scene, it indicates which parts of the scene are most likely to be salient.
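The two terms of Equation (1) can be illustrated with a toy sketch. It rests on several simplifying assumptions that are not part of the text above: each pixel carries a single quantized local feature, $p(v_l \mid v_c)$ is estimated from the histogram of that feature over the image, and the contextual prior is a hypothetical Gaussian band centred on one row (as if the scene type favoured one horizontal region). All function names are illustrative.

```python
from collections import Counter
import math


def bottom_up_term(features):
    """Return 1 / p(v_l | v_c) per pixel.

    Assumption: the probability of a feature value is its relative
    frequency in this image, so rare values get a large score.
    """
    flat = [v for row in features for v in row]
    counts = Counter(flat)
    total = len(flat)
    return [[total / counts[v] for v in row] for row in features]


def contextual_prior(height, width, center_row, sigma):
    """Hypothetical p(s, x | v_c): a Gaussian band over rows, summing to 1."""
    weights = [math.exp(-0.5 * ((r - center_row) / sigma) ** 2)
               for r in range(height)]
    norm = sum(weights) * width
    return [[w / norm for _ in range(width)] for w in weights]


def saliency(features, prior):
    """Combine both terms pixel by pixel, as in Equation (1)."""
    rarity = bottom_up_term(features)
    return [[rarity[r][c] * prior[r][c] for c in range(len(row))]
            for r, row in enumerate(features)]


# A 4x4 map of quantized feature values: 0 is common, 1 and 2 are rare.
features = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 2],
]
prior = contextual_prior(4, 4, center_row=1, sigma=1.0)
S = saliency(features, prior)

best = max(((r, c) for r in range(4) for c in range(4)),
           key=lambda rc: S[rc[0]][rc[1]])
print(best)  # -> (1, 1)
```

Both rare features receive the same bottom-up score, but the one at row 1 coincides with the region favoured by the prior and therefore ends up most salient, while the one at row 3 is attenuated. This is only meant to show how the two terms interact, not how any particular published model estimates them.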